Upgrading to dbt utils v1.0
For the first time, dbt utils is crossing the major version boundary. From last month’s blog post:
It’s time to formalize what was already unofficial policy: you can rely on dbt utils in the same way as you do dbt Core, with stable interfaces and consistent and intuitive naming.
Just like the switch to dbt Core 1.0 last year, there are some breaking changes as we standardized and prepared for the future. Most changes can be handled with find-and-replace. If you need help, post on the Community Forum or in #package-ecosystem channel on Slack.
New features
get_single_value()
— An easy way to pull a single value from a SQL query, instead of having to access the[0][0]
th element of arun_query
result.safe_divide()
— Returns null when the denominator is 0, instead of throwing a divide-by-zero error.- New
not_empty_string
test — An easier wrapper than usingexpression_is_true
to check the length of a column.
Enhancements
- Many tests are more meaningful when you run them against subgroups of a table. For example, you may need to validate that recent data exists for every turnstile instead of a single data source being sufficient. Add the new
group_by_columns
argument to your tests to do so. Review this article by the test's author for more information. - With the addition of an on-by-default
quote_identifiers
argument in thestar()
macro, you can now disable quoting if necessary. - The
recency
test now has an optionalignore_time_component
argument which can be used when testing against a date column. This prevents the time of day the test runs from causing false negatives/positives.
Fixes
union()
now includes/excludes columns case-insensitivelyslugify()
prefixes an underscore when the first char is a digit- The
expression_is_true
test doesn’t output*
unless storing failures, a cost improvement for BigQuery.
Breaking Changes
Changes to surrogate_key()
:
surrogate_key()
has been replaced bygenerate_surrogate_key()
. The original treated null values and blank strings the same, which could lead to duplicate keys being created.generate_surrogate_key()
does not have this flaw. Compare the surrogate keys calculated for these columns:
Changing the calculation method for surrogate keys, even for the better, could have significant consequences in downstream uses (such as snapshots and incremental models which use this column as their unique_key
). As a result, it's possible to opt into the legacy behavior by setting the following variable in your dbt project:
#dbt_project.yml
vars:
surrogate_key_treat_nulls_as_empty_strings: true #turn on legacy behavior
By creating a new macro instead of updating the behavior of the old one, we are requiring all projects who use this macro to make an explicit decision about which approach is better for their context.
Our recommendation is that existing users should opt into the legacy behavior unless you are confident that either:
- your surrogate keys never contained nulls, or
- your surrogate keys are not used for incremental models, snapshots or other stateful artifacts and so can be regenerated with new values without issue.
You can not assume one behavior or the other, as each project can customize its behavior.
Functionality now native to dbt Core:
- The
expression_is_true
test no longer has a dedicatedcondition
argument. Instead, usewhere
which is now available natively to all tests:
version: 2
models:
- name: old_syntax
tests:
- dbt_utils.expression_is_true:
expression: "col_a + col_b = total"
#replace this...
condition: "created_at > '2018-12-31'"
- name: new_syntax
tests:
- dbt_utils.expression_is_true:
expression: "col_a + col_b = total"
# ...with this...
where: "created_at > '2018-12-31'"
Note — This may cause some tests to get the same autogenerated names. To resolve this, you can define a custom name for a test.
- The deprecated
unique_where
andnot_null_where
tests have been removed, because where is now available natively to all tests. To migrate, find and replacedbt_utils.unique_where
withunique
anddbt_utils.not_null_where
withnot_null
. dbt_utils.current_timestamp()
is replaced bydbt.current_timestamp()
.- Note that Postgres and Snowflake’s implementation of
dbt.current_timestamp()
differs from the olddbt_utils
one (full details here). If you use Postgres or Snowflake and need identical backwards-compatible behavior, usedbt.current_timestamp_backcompat()
. This discrepancy will hopefully be reconciled in a future version of dbt Core.
- Note that Postgres and Snowflake’s implementation of
- All other cross-db macros have moved to the dbt namespace, with no changes necessary other than replacing
dbt_utils.
withdbt.
. Review the cross database macros documentation for the full list.- In your code editor, you can do a global find and replace with regex:
\{\{\s*dbt_utils\.(any_value|bool_or|cast_bool_to_text|concat|dateadd|datediff|date_trunc|escape_single_quotes|except|hash|intersect|last_day|length|listagg|position|replace|right|safe_cast|split_part|string_literal|type_bigint|type_float|type_int|type_numeric|type_string|type_timestamp|type_bigint|type_float|type_int|type_numeric|type_string|type_timestamp|except|intersect|concat|hash|length|position|replace|right|split_part|escape_single_quotes|string_literal|any_value|bool_or|listagg|cast_bool_to_text|safe_cast|dateadd|datediff|date_trunc|last_day)
→{{ dbt.$1
- In your code editor, you can do a global find and replace with regex:
Removal of insert_by_period
materialization
- The
insert_by_period
materialization has been moved to the experimental-features repo. To continue to use it, add the below to your packages.yml file:
packages:
- git: https://github.com/dbt-labs/dbt-labs-experimental-features
subdirectory: insert_by_period
revision: XXXX #optional but highly recommended. Provide a full git sha hash, e.g. 1c0bfacc49551b2e67d8579cf8ed459d68546e00. If not provided, uses the current HEAD.
Removal of deprecated legacy behavior:
safe_add()
only works with a list of arguments; use{{ dbt_utils.safe_add(['column_1', 'column_2']) }}
instead of varargs{{ dbt_utils.safe_add('column_1', 'column_2') }}
.- Several long-promised deprecations to
deduplicate()
have been applied:- The
group_by
argument is replaced bypartition_by
. relation_alias
is removed. If you need an alias, you can pass it directly to therelation
argument.order_by
is now mandatory. Pass a static value like1
if you don’t care how they are deduplicated.
- The
- The deprecated
table
argument has been removed fromunpivot()
. Userelation
instead.
Resolving error messages
After upgrading, these are common error messages you may encounter, along with their resolutions.
dict object has no attribute MACRO_NAME
Cause: No macro called MACRO_NAME
exists. This is most likely because the macro has moved to the dbt
namespace (see above). It could also be because you haven't run dbt deps or have misspelled a macro's name.
Resolution: For cross-database macros, change dbt_utils.MACRO_NAME()
to dbt.MACRO_NAME()
.
macro 'dbt_macro__generate_surrogate_key' takes not more than 1 argument(s)
Cause: generate_surrogate_key()
requires a single argument containing a list of columns, not a set of varargs.
Resolution: Change to dbt_utils.generate_surrogate_key(['column_1', 'column_2'])
- note the square brackets.
The dbt_utils.surrogate_key has been replaced by dbt_utils.generate_surrogate_key
Cause: surrogate_key()
has been replaced.
Resolution:
- Decide whether you need to enable backwards compatibility as detailed above.
- Find and replace
dbt_utils.surrogate_key
withdbt_utils.generate_surrogate_key
.
macro dbt_macro__test_expression_is_true takes no keyword argument condition
Cause: condition
has been removed from the expression_is_true
test, now that where
is available on all tests automatically.
Resolution: Replace condition
with where
.
No materialization insert_by_period was found for adapter
Cause: insert_by_period
has moved to the experimental features repo (see above).
Resolution: Install the package as described above.
dbt found two tests with the name "XXX".
Cause: Changing from condition
to where
in the expression_is_true
test, as configs are not part of a test's unique name.
Resolution: Define a custom name for your test.